Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions
A person's face discloses important information about their affective state.
Although there has been extensive research on recognition of facial
expressions, the performance of existing approaches is challenged by facial
occlusions. Facial occlusions are often treated as noise and discarded in
recognition of affective states. However, hand over face occlusions can provide
additional information for recognition of some affective states such as
curiosity, frustration and boredom. One of the reasons that this problem has
not gained attention is the lack of naturalistic occluded faces that contain
hand over face occlusions as well as other types of occlusions. Traditional
approaches for obtaining affective data are time-consuming and expensive, which
limits researchers in affective computing to small datasets. This limitation
affects the generalizability of models and prevents researchers from taking
advantage of recent advances in deep learning that have shown great success in
many fields but require large volumes of data. In this paper, we
first introduce a novel framework for synthesizing naturalistic facial
occlusions from an initial dataset of non-occluded faces and separate images of
hands, reducing the costly process of data collection and annotation. We then
propose a model for facial occlusion type recognition to differentiate between
hand over face occlusions and other types of occlusions such as scarves, hair,
glasses and objects. Finally, we present a model to localize hand over face
occlusions and identify the occluded regions of the face.
Comment: Accepted to the International Conference on Affective Computing and Intelligent Interaction (ACII), 2017
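As an illustration of the synthesis step, the following is a minimal sketch of compositing a segmented hand cutout onto a non-occluded face via alpha blending (Python with Pillow; the function name, paths, and position argument are illustrative, and the paper's framework additionally handles realistic placement, scaling, and appearance matching):

    from PIL import Image

    def synthesize_occlusion(face_path, hand_path, position):
        # Load the face and a hand cutout with a transparent background.
        face = Image.open(face_path).convert("RGBA")
        hand = Image.open(hand_path).convert("RGBA")
        # Alpha-composite the hand over the face at the given (x, y) offset.
        face.alpha_composite(hand, dest=position)
        return face.convert("RGB")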
OpenFace: An open source facial behavior analysis toolkit
Over the past few years, there has been an increased
interest in automatic facial behavior analysis and understanding.
We present OpenFace, an open source tool
intended for computer vision and machine learning researchers,
the affective computing community, and people interested
in building interactive applications based on facial
behavior analysis. OpenFace is the first open source tool
capable of facial landmark detection, head pose estimation,
facial action unit recognition, and eye-gaze estimation.
The computer vision algorithms that form the core of
OpenFace demonstrate state-of-the-art results in all of the
above-mentioned tasks. Furthermore, our tool is capable of
real-time performance and is able to run from a simple webcam
without any specialist hardware. Finally, OpenFace
allows for easy integration with other applications and devices
through a lightweight messaging system.
European Community Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 289021 (ASC-Inclusion)
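As a usage illustration, a minimal sketch of driving OpenFace's FeatureExtraction tool from Python (paths are placeholders; this assumes the binary is on the PATH and uses its -f and -out_dir command-line options):

    import subprocess

    # Run OpenFace's FeatureExtraction on a video; it writes facial
    # landmarks, head pose, action units, and gaze estimates to CSV.
    subprocess.run(
        ["FeatureExtraction", "-f", "input_video.mp4", "-out_dir", "processed"],
        check=True,
    )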
Video-based sympathetic arousal assessment via peripheral blood flow estimation
Electrodermal activity (EDA) is considered a standard marker of sympathetic
activity. However, traditional EDA measurement requires electrodes in steady
contact with the skin. Can sympathetic arousal be measured using only an
optical sensor, such as an RGB camera? This paper presents a novel approach to
infer sympathetic arousal by measuring the peripheral blood flow on the face or
hand optically. We contribute a self-recorded dataset of 21 participants,
comprising synchronized videos of participants' faces and palms and
gold-standard EDA and photoplethysmography (PPG) signals. Our results show that
we can measure peripheral sympathetic responses that closely correlate with the
ground truth EDA. We obtain median correlations of 0.57 to 0.63 between our
inferred signals and the ground truth EDA using only videos of the
participants' palms or foreheads or PPG signals from the foreheads or fingers.
We also show that sympathetic arousal is best inferred from the forehead,
finger, or palm.
Comment: Accepted and to be published in Biomedical Optics Express
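As a sketch of the evaluation idea, the snippet below correlates an optically inferred blood-flow signal with ground-truth EDA after resampling to a common length (names are illustrative; the paper reports median correlations across participants rather than a single score):

    import numpy as np
    from scipy.signal import resample
    from scipy.stats import pearsonr

    def arousal_correlation(inferred, eda):
        # Resample the inferred peripheral blood-flow signal to the
        # length of the EDA recording, then correlate the two.
        inferred = resample(np.asarray(inferred, dtype=float), len(eda))
        r, _ = pearsonr(inferred, np.asarray(eda, dtype=float))
        return r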
A facial affect mapping engine
Facial expressions play a crucial role in human interaction. Interactive digital games can help teach people to both express and recognise them. Such interactive games can benefit from the ability to alter user expressions dynamically and in real time. In this demonstration, we present the Facial Affect Mapping Engine (FAME), a framework for mapping and manipulating facial expressions across images and video streams. Our system is fully automatic, runs in real time, and does not require any specialist hardware. FAME presents new possibilities for the designers of intelligent interactive digital games.
Real-Time Inference of Mental States from Facial Expressions and Upper Body Gestures
We present a real-time system for detecting facial action units and inferring emotional states from head and shoulder gestures and facial expressions. The dynamic system uses three levels of inference on progressively longer time scales. Firstly, facial action units and head orientation are identified from 22 feature points and Gabor filters. Secondly, Hidden Markov Models are used to classify sequences of actions into head and shoulder gestures. Finally, a multi-level Dynamic Bayesian Network is used to model the unfolding emotional state based on the probabilities of different gestures. The most probable state over a given video clip is chosen as the label for that clip. The average F1 score for 12 action units (AUs 1, 2, 4, 6, 7, 10, 12, 15, 17, 18, 25, 26), labelled on a frame-by-frame basis, was 0.461. The average classification rate for five emotional states (anger, fear, joy, relief, sadness) was 0.440. Sadness had the greatest rate, 0.64, and anger the smallest, 0.11.
Thales Research and Technology (UK); Bradlow Foundation Trust; Procter & Gamble Company
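The final labelling step, choosing the most probable state over a clip, can be sketched as follows, assuming per-frame state probabilities from the Dynamic Bayesian Network are already available (names are illustrative):

    import numpy as np

    def clip_label(frame_probs, states):
        # frame_probs: (n_frames, n_states) per-frame posteriors.
        # The clip label is the state with the highest mean probability.
        frame_probs = np.asarray(frame_probs, dtype=float)
        return states[int(np.argmax(frame_probs.mean(axis=0)))]

For example, clip_label(probs, ["anger", "fear", "joy", "relief", "sadness"]) returns the clip-level label used for the classification rates above.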
Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
This paper presents a 3D generative model that uses diffusion models to
automatically generate 3D digital avatars represented as neural radiance
fields. A significant challenge in generating such avatars is that the memory
and processing costs in 3D are prohibitive for producing the rich details
required for high-quality avatars. To tackle this problem we propose the
roll-out diffusion network (Rodin), which represents a neural radiance field as
multiple 2D feature maps and rolls out these maps into a single 2D feature
plane within which we perform 3D-aware diffusion. The Rodin model brings the
much-needed computational efficiency while preserving the integrity of
diffusion in 3D by using 3D-aware convolution that attends to projected
features in the 2D feature plane according to their original relationship in
3D. We also use latent conditioning to orchestrate the feature generation for
global coherence, leading to high-fidelity avatars and enabling their semantic
editing based on text prompts. Finally, we use hierarchical synthesis to
further enhance details. The 3D avatars generated by our model compare
favorably with those produced by existing generative techniques. We can
generate highly detailed avatars with realistic hairstyles and facial hair like
beards. We also demonstrate 3D avatar generation from image or text as well as
text-guided editability.
Comment: Project Webpage: https://3d-avatar-diffusion.microsoft.com
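A schematic of the roll-out step, under the assumption that the radiance field is stored as several 2D feature planes (e.g., a tri-plane representation): the planes are tiled into one wide 2D map that a single 2D diffusion network can process. The 3D-aware convolution that Rodin applies on top of this plane is omitted here:

    import torch

    def roll_out(feature_maps: torch.Tensor) -> torch.Tensor:
        # Tile (n_planes, C, H, W) feature maps side by side into a
        # single (C, H, n_planes * W) plane for 2D diffusion.
        n, c, h, w = feature_maps.shape
        return feature_maps.permute(1, 2, 0, 3).reshape(c, h, n * w)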
CAM3D
Cam3D consists of 108 labelled videos of 12 mental states, including spontaneous facial expressions and hand gestures. It was labelled using crowd-sourcing (inter-rater reliability κ = 0.45).
We used three different sensors for data collection: Microsoft Kinect sensors, HD cameras, and the microphones in the HD cameras. After the initial data collection, the videos were segmented. Each segment showed a single event, such as a change in facial expression, a head or body posture movement, or a hand gesture. From videos with public consent, a total of 451 segments were collected. The mean segment duration is 6 seconds.
Labelling was based on context-free observer judgment. Public segments were labelled by community crowd-sourcing. Out of the 451 segmented videos, we wanted to extract those that could reliably be described as belonging to one of the 24 emotion groups from the Baron-Cohen taxonomy. Of the 2916 labels collected, 122 did not appear in the taxonomy and so were not considered in the analysis. The remaining 2794 labels were grouped as belonging to one of the 24 groups plus agreement, disagreement, and neutral. To filter out non-emotional segments, we kept only the videos on which 60% or more of the raters agreed. This resulted in 108 segments in total. The most common label given to a video segment was taken as the ground truth.
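The agreement filtering and ground-truth assignment described above amount to a majority vote, sketched below (the segment_labels mapping from segment ids to lists of rater labels is an assumed input format):

    from collections import Counter

    def filter_and_label(segment_labels, min_agreement=0.6):
        # Keep a segment only if at least 60% of raters gave the same
        # label; that most common label becomes the ground truth.
        ground_truth = {}
        for seg, labels in segment_labels.items():
            label, count = Counter(labels).most_common(1)[0]
            if count / len(labels) >= min_agreement:
                ground_truth[seg] = label
        return ground_truth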
The data is categorized by ground-truth label and divided into seven folders. For each video segment, we provide the colour video, camera parameters, colour images, and their corresponding aligned depth images.
Crowdsourcing in Emotion Studies Across Time and Culture
Crowdsourcing is becoming increasingly popular as a cheap and effective tool for multimedia annotation. However, the idea is not new and can be traced back to Charles Darwin. He was interested in studying the universality of facial expressions in conveying emotions, and thus had to consider a global population. Access to different cultures allowed him to reach more general conclusions. In this paper, we highlight a few milestones in the history of the study of emotion that share the concepts of crowdsourcing. We first consider the study of posed photographs and then move to videos of natural expressions. We present our use of crowdsourcing to label a video corpus of natural expressions, and also to recreate one of Darwin's original emotion judgment experiments. This allows us to compare people's perception of emotional expressions in the 19th and 21st centuries, showing that it remains stable across both culture and time.